With the rapid growth of e-commerce platforms and digital services, customer reviews have become a valuable source of information for understanding user sentiment and improving product quality. Traditional sentiment analysis approaches primarily focus on classifying reviews into sentiment categories but often fail to provide contextual insights or related customer experiences. To overcome this limitation, this paper proposes a Retrieval-Augmented Generation (RAG) enhanced sentiment analysis system that combines Transformer-based sentiment classification with semantic similarity retrieval.
The proposed system utilizes a fine-tuned Transformer model for sentiment prediction and Hugging Face sentence embeddings to convert customer reviews into vector representations. These embeddings are indexed using Facebook AI Similarity Search (FAISS) to enable efficient retrieval of semantically similar reviews. Additionally, a Streamlit-based interactive web application is developed to visualize sentiment predictions, confidence scores, similar review retrieval, and word cloud representations.
Experimental results demonstrate that the integration of sentiment classification with retrieval-based techniques enhances contextual understanding and interpretability of customer opinions. The proposed system is scalable, efficient, and suitable for real-world applications such as customer feedback analysis, business intelligence, and decision support systems.
Introduction
Customer reviews are a critical source of insight for businesses, but the large volume of user-generated text makes manual analysis impractical, driving the need for automated sentiment analysis. Traditional machine learning methods (e.g., Naïve Bayes, SVM, Logistic Regression) and early deep learning models improved basic sentiment classification but struggled with complex language, long-range context, and interpretability. Transformer-based models such as BERT, RoBERTa, and DistilBERT significantly enhanced sentiment prediction by capturing contextual semantics, yet most systems still provide only sentiment labels without supporting contextual insights.
To address this limitation, the study proposes a Retrieval-Augmented Generation (RAG)–enhanced sentiment analysis framework that combines Transformer-based sentiment classification with FAISS-based semantic similarity retrieval. The system first predicts sentiment polarity and confidence scores using a fine-tuned Transformer model, then retrieves semantically similar customer reviews using sentence embeddings indexed in FAISS. This integration enriches sentiment outputs with relevant contextual evidence, improving interpretability and practical decision-making.
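The retrieval step described above can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: a normalised bag-of-words vector stands in for the Hugging Face sentence embeddings, and a brute-force inner-product search stands in for the FAISS index (the paper does not state which index type is used). The names `embed` and `ReviewIndex` and the sample reviews are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for a sentence embedding: L2-normalised bag-of-words (sparse dict)."""
    counts = Counter(re.findall(r"[a-z0-9']+", text.lower()))
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {tok: c / norm for tok, c in counts.items()} if norm else {}


def inner_product(a, b):
    """Dot product of two sparse vectors; equals cosine similarity for unit vectors."""
    return sum(weight * b.get(tok, 0.0) for tok, weight in a.items())


class ReviewIndex:
    """Exact (brute-force) similarity search; an assumed stand-in for a FAISS index."""

    def __init__(self):
        self.reviews = []
        self.vectors = []

    def add(self, reviews):
        for review in reviews:
            self.reviews.append(review)
            self.vectors.append(embed(review))

    def search(self, query, k=3):
        """Return up to k (score, review) pairs, most similar first."""
        q = embed(query)
        scored = [(inner_product(q, vec), review)
                  for vec, review in zip(self.vectors, self.reviews)]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:k]


index = ReviewIndex()
index.add([
    "Battery life is excellent and charging is fast",
    "The battery drains quickly and charging is slow",
    "Delivery was late and the box arrived damaged",
])
results = index.search("battery charging is fast", k=2)
for score, review in results:
    print(f"{score:.3f}  {review}")
```

In a deployment along the lines the paper describes, `embed` would be replaced by a sentence-embedding model and `ReviewIndex` by a FAISS index, so that the stored vectors capture meaning rather than shared words and the search scales beyond a linear scan.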
The framework follows a modular architecture including data ingestion, preprocessing, sentiment analysis, semantic retrieval, and deployment via a Streamlit web application. The system supports real-time sentiment prediction, retrieval of similar reviews, confidence visualization, and word-cloud analysis. Experimental results on real-world e-commerce review data demonstrate effective sentiment classification, fast and accurate retrieval, and improved explainability. Overall, the proposed RAG-based approach offers a scalable and practical solution for intelligent customer feedback analysis and business intelligence applications.
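The modular flow described above (preprocessing, sentiment prediction with a confidence score, and retrieval of similar reviews) can be sketched as a small pipeline of interchangeable stages. This is only an illustration under stated assumptions: `lexicon_logits` is a hypothetical keyword scorer standing in for the fine-tuned Transformer, `overlap_retrieve` is a hypothetical word-overlap ranker standing in for the FAISS search, and the confidence is taken to be the softmax probability of the predicted class, which the paper does not specify.

```python
import math
import re
from dataclasses import dataclass


@dataclass
class AnalysisResult:
    text: str
    label: str
    confidence: float
    similar_reviews: list


def preprocess(text):
    """Lowercase and collapse whitespace (the paper's exact cleaning steps are not given)."""
    return re.sub(r"\s+", " ", text.strip().lower())


def softmax(logits):
    """Convert raw class scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def lexicon_logits(text):
    """Hypothetical stand-in for the Transformer: returns [negative, positive] scores."""
    positive = {"great", "excellent", "love", "fast"}
    negative = {"bad", "slow", "broken", "late"}
    words = text.split()
    return [float(sum(w in negative for w in words)),
            float(sum(w in positive for w in words))]


def overlap_retrieve(text, corpus, k):
    """Hypothetical stand-in for FAISS retrieval: rank stored reviews by shared words."""
    query = set(text.split())
    ranked = sorted(corpus,
                    key=lambda r: len(query & set(preprocess(r).split())),
                    reverse=True)
    return ranked[:k]


def analyze(text, corpus, classify=lexicon_logits, retrieve=overlap_retrieve, k=2):
    """Run the stages in order and bundle label, confidence, and supporting reviews."""
    clean = preprocess(text)
    probs = softmax(classify(clean))
    labels = ["negative", "positive"]
    best = max(range(len(probs)), key=probs.__getitem__)
    return AnalysisResult(text, labels[best], probs[best], retrieve(clean, corpus, k))


corpus = ["Fast delivery, great phone", "Screen arrived broken", "Slow and late shipping"]
result = analyze("  Great phone and FAST shipping ", corpus)
print(result.label, round(result.confidence, 3), result.similar_reviews)
```

Passing the classifier and retriever as arguments mirrors the modular design: either stage can be swapped for a real model or index without changing `analyze`, and a front end such as the Streamlit application would only need to render the fields of `AnalysisResult`.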
Conclusion
This paper presented an intelligent sentiment analysis system integrated with Retrieval-Augmented Generation (RAG) to enhance the interpretation of user reviews. By combining a Transformer-based sentiment classification model with FAISS-powered semantic similarity search, the system not only predicts sentiment polarity but also provides contextual support through relevant historical reviews.
The deployment of the proposed approach using a Streamlit-based web interface demonstrates its practical applicability for real-time customer feedback analysis. The inclusion of visual analytics such as confidence indicators and word cloud representations further improves interpretability and user engagement. In the future, the system can be enhanced by incorporating larger multilingual datasets, advanced transformer architectures, and real-time data streaming from social media platforms. Additionally, integrating explainable AI (XAI) techniques and fine-grained aspect-based sentiment analysis could further improve transparency and analytical depth.
References
[1] Y. Liu, M. Ott, N. Goyal, et al., “RoBERTa: A Robustly Optimized BERT Pretraining Approach,” arXiv preprint arXiv:1907.11692, 2019.
[2] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, “BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding,” Proceedings of NAACL-HLT, pp. 4171–4186, 2019.
[3] J. Johnson, M. Douze, and H. Jégou, “Billion-scale similarity search with GPUs,” IEEE Transactions on Big Data, vol. 7, no. 3, pp. 535–547, 2021.
[4] N. Reimers and I. Gurevych, “Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks,” Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 3982–3992, 2019.
[5] T. Wolf, L. Debut, V. Sanh, et al., “Transformers: State-of-the-Art Natural Language Processing,” Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, pp. 38–45, 2020.
[6] A. McCallum and K. Nigam, “A Comparison of Event Models for Naive Bayes Text Classification,” AAAI Workshop on Learning for Text Categorization, 1998.
[7] Streamlit Inc., “Streamlit: A Framework for Building Data Science Applications,” 2023. [Online]. Available: https://streamlit.io